Abstract


The primary goal of this whitepaper is to elucidate the key features 
and capabilities of the Einstein Discovery machine learning (ML) product 
- specifically for readers who want to understand why it occupies a unique 
niche in the machine learning and predictive modeling marketplace. The 
intended audience for this document ranges broadly from business users 
and executives, to analysts, and of course, to data scientists and machine 
learning professionals. This whitepaper is specifically intended for readers 
who want to understand why Einstein Discovery is different from other 
ML and analytics tools, and why it might matter to their organization. 

Please note, while this whitepaper does cover many fundamental aspects 
of Einstein Discovery, it is not intended to be a comprehensive tutorial 
nor a general introductory training on Einstein Discovery. That task has 
been adequately addressed by many talented authors and bloggers. We 
encourage you to explore the learning resources listed in the appendix to 
gain basic proficiency with Einstein Discovery functionality.

The approach we take in this whitepaper is to explore the differentiating 
capabilities of Einstein Discovery by examining its four key layers: 


• Business Application Layer, covered in Section 2
• Data Science Layer, covered in Section 3
• Data Platform Layer, covered in Section 4
• ML Operations, covered in Section 5


These layers delineate levels of logical separation and, in some cases, 
physical separation. They are analogous to other common technology 
stacks (e.g. OSI network, full stack). 


Figure 1: Einstein Discovery Layers 
1 Introduction


Salesforce Einstein Discovery fills a unique niche in the crowded machine learning
market by providing a compelling "power to the people" strategy. It embraces
individuals with little or no statistical or data science training, while
simultaneously bringing many benefits to trained data professionals. It provides
world-class, supervised machine learning in a fully declarative environment. It
creates highly explainable and explorable data analysis that speaks to a broad
spectrum of potential users, which include:


• Business Analysts and Power Users
• Data Professionals, including Data Scientists and Machine Learning Practitioners
• Salesforce Users and Admins
• Business Managers and Executives
• Application Developers


While Einstein Discovery is capable of tackling a wide array of machine learning
challenges, the majority of the real-world efforts performed with Einstein
Discovery are focused on increasing transparency into business data problems,
creating intuitive models, and putting predictions and actionable insights
into the hands of business decision makers. Performing this level of analysis with
the traditional ML pipeline approach is complex, time-consuming, and expensive.
Adopters of Einstein Discovery are keenly interested in avoiding the lengthy
development time, resource burdens, and hidden costs of building and deploying
traditional custom predictive models. All models are initially at least somewhat
imperfect. In many instances, Einstein Discovery delivers its major return on
investment by getting a model that's "close enough" into production quickly, in
front of business users, where it can start improving business outcomes.
A clicks-not-code prototyping environment expedites analysis, streamlines model
tuning, and realizes faster benefits in production. Einstein Discovery facilitates
a rapid, iterative, revise-and-redeploy approach that uses operational feedback
(prediction accuracy) to deliver a solution that evolves with continuous
improvement.


For data scientists, machine learning practitioners, and other statistical/data 
professionals, it is important to note that the role they will play in an Einstein 
Discovery solution deployment is more important than ever. Although their 
involvement may differ from their usual engagement, it’s still critically important 
that they bring their expertise to bear in these projects. An example from 
the author's direct experience is a large, multi-thousand-user CRM Analytics
environment in which the goal was to create prediction scores that would be 
tightly integrated into opportunities and leads within their CRM application. 
While the IT admin team was able to facilitate much of the data gathering, 
configuration, analysis, deployment, etc., the model validation portion of the 
process was among the most critical keys to success. The data science team 
was heavily involved in feature engineering discussions, collaboration with the 
business experts on the use case, and validating the model performance. A critical 
contribution of the data science team involved an expert, in-depth comparison 
of the Einstein Discovery model with a more traditional, coded solution. The 
performance, accuracy, and efficacy of the model turned out to be so close to 
the custom code solution that the difference was deemed irrelevant and not 
worth further scrutiny. In cases like these, the data science team is able to 
leverage Einstein Discovery as an enabling framework to operationalize and 
deploy models into large-scale production. That "last mile" implementation
phase of data science solutions is often the hardest, most expensive, and (for data 
scientists) the least appealing part of the process. Einstein Discovery simplifies 
and streamlines the deployment portion of the data science pipeline, whether 
it’s for 10 users or 10,000. 


Einstein Discovery began its life as a Silicon Valley startup named Beyondcore 
in 2004. The founders were inspired by the idea that “citizen data scientists” 
should be empowered to participate in the new wave of machine learning and AI 
for business use cases. The lineage of the firm provides relevant context for this 
paper because its founders showed incredible foresight by building a world-class 
ML product that was aimed at the masses, accompanied by the unique patented 
technologies that sprang from this vision. In 2016, Beyondcore was purchased 
by Salesforce, primarily as a vehicle to add predictive and prescriptive machine 
learning to its already successful Wave Analytics Cloud (since re-branded as CRM
Analytics). In 2018, Beyondcore officially became Einstein Discovery, and today
rapid innovation continues with an entire team at Salesforce dedicated to its
development and expansion. One area of recent significant investment has been
to deepen native integration with Tableau, the world's leading data visualization
company that Salesforce acquired in 2019. New integration capabilities allow
Tableau users to get predictions and improvements in their Tableau data and
visualizations by leveraging AI-powered Einstein Discovery models with clicks,
not code.


2 Unique Einstein Discovery Capabilities at the 
Business Application Layer 


Summary The business application layer encompasses the interface with which 
the user interacts with the software - whether they are analyzing and exploring 
data, building and managing models, or consuming predictive and prescriptive 
insights in business applications. Einstein Discovery has unique strengths in the 
following key areas: 


• Building Models
• Understanding Data
• Speed and Iteration
• Augmented Intelligence
• Workload Prioritization
• Automation
• Actionability


Model Building


Einstein Discovery has a unique interface for ease of model creation that makes 
no assumptions about data science expertise. A business analyst, for example, 
can create a "story" in Einstein Discovery simply by choosing a critical business
metric they want to maximize or minimize. Once they’ve initiated the story 
building process, a graphical wizard interface walks them through the steps to 
choose which features to include in the model, add filters, apply transformations, 
review frequency tables, and so on. 


Figure 2: Model Creation Wizard - Selecting an Outcome Variable 
At the completion of this simple graphical wizard, Einstein Discovery executes
(behind the scenes) a regression or tree-based analysis to create an ML model
and generate a collection of insights, complete with visualizations and natural
language-based explanations. The story is a key asset. In rich narrative form, it
helps highlight statistical correlations between a business-relevant metric (an
outcome variable, such as a KPI) and the explanatory variables that are potential
influencers of that KPI.

The story metaphor provides a navigable user interface with insights that
allow the user to discover and diagnose patterns in the data using
component-driver analysis, asking "what if" style questions, and, of course,
reviewing predictions and prescriptions in the model. All of that rich
functionality is accomplished via a simple graphical wizard that requires no
coding, and no knowledge of machine learning mechanisms. This relatable framework
allows the user to deeply understand the data with the full range of analytics
capabilities - Descriptive, Diagnostic, Predictive, and Prescriptive.


Data Understanding 


After Einstein Discovery has analyzed the data, generated a story, and created
a model, the GUI presents the user with a series of insights. Each insight
is represented with visualizations and natural language explanations. If the
outcome (dependent) variable is numeric, a t-test is performed to determine
which insights are most strongly associated with the outcome variable.
Likewise, if the outcome variable is a binary classification (yes/no, pass/fail,
churn/not churn), then a chi-squared test is performed. The descriptive insights
in the resulting Einstein story are sorted according to statistical significance in
the model. In other words, the first chart has the highest R² value relative to
the outcome variable. The story insights also list the predictor variables with the
strongest correlations relative to the outcome in the model. (Note that the R²
values for all coefficients in the model are available in the model metrics tab.)
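To make the two tests named above concrete, here is a minimal sketch in plain Python. It is not Einstein Discovery's implementation: the choice of Welch's variant of the t-test, the function names, and the sample data are all assumptions made for illustration.

```python
import math

def welch_t_statistic(group_a, group_b):
    """Welch's t statistic for the difference in means of two samples."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)

def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical data: a numeric outcome split by region, and churn counts by plan.
west = [120.0, 135.0, 128.0, 140.0, 131.0]
east = [100.0, 98.0, 105.0, 110.0, 102.0]
churn_by_plan = [[30, 70],   # Basic: churned, retained
                 [10, 90]]   # Premium: churned, retained

t_stat = welch_t_statistic(west, east)
chi2 = chi_squared_statistic(churn_by_plan)
```

A larger statistic (in absolute value) indicates a stronger association with the outcome, which is the kind of quantity a tool could use to rank insights.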


Figure 3: Einstein Discovery Story Insights 
Some of the descriptive bar charts are single variable representations, which
are ranked purely by statistical significance. As the user scrolls further into the
story, they encounter charts where two variables have been combined in order
to deepen their understanding of the data. For these two-field (multivariate)
bar charts, Einstein Discovery uses a technique that calculates the adjusted R² of
the coefficient for each combination of values. This powerful mechanism means
that bar charts with two variables are sorted by deep interaction effects that could
not have been understood statistically by looking only at bivariate bar charts.
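For readers unfamiliar with the adjusted R² mentioned above, the standard textbook definition can be sketched as follows. How Einstein Discovery applies it to each combination of values is not described here, so the function names and inputs are illustrative assumptions only.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_rows, n_predictors):
    """Penalize R² for the number of predictors used to achieve it."""
    return 1.0 - (1.0 - r2) * (n_rows - 1) / (n_rows - n_predictors - 1)

# Hypothetical values: a two-variable chart evaluated on 101 rows.
adjusted = adjusted_r_squared(0.8, n_rows=101, n_predictors=2)
```

Because the adjustment grows with the number of predictors, a two-variable combination has to explain meaningfully more variance than a single variable to rank above it.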


Einstein Discovery provides a predictive "what if" functionality by allowing
the analyst to simulate prediction scores interactively. In addition to delivering 
a prediction score, the interface also shows top explanatory factors and ways to 
improve the predicted outcome. This capability allows the user to interactively 
simulate predictions for groups/categories within the model features. 


Figure 4: Einstein Discovery Predictions 
Communicating about the data, with visualizations and automatic explanations
of how individual data elements are interrelated, has always been at
the forefront of Einstein Discovery's design philosophy. The following text is
pulled directly from Beyondcore Patent 9,135,286 filed in 2015: "Variable value
combinations that are predominant drivers of key observations are automatically
determined from several competing variable value combinations. The identified
variable value combinations can then be then used to predict future trends
underlying the business intelligence data. In another embodiment, an observed
outcome is decomposed into multiple contributing drivers and the impact of each
of the contributing drivers can be analyzed and numerically quantified - as a static
snapshot or as a time-varying evolution. Similarly, differences in observations
between two groups can be decomposed into multiple contributing sub-groups for
each of the groups and pairwise differences among sub-groups can be quantified
and analyzed."


Speed and Iteration A common challenge for many organizations is that - 
once built - a well-crafted model can be difficult to implement into production 
environments and integrate seamlessly with the operations they are designed 
to benefit. A key differentiator in the Einstein Discovery value proposition 
addresses the total quantity of human effort that’s required for the creation, 
deployment, and operational integration of a typical machine learning model. 
Consider the aggregate amount of time it takes for a typical data scientist to 
go from a raw dataset to a fully trained ML model that derives descriptive, 
diagnostic, predictive, and prescriptive insights from the data. When surveying
the spectrum of available ML tools, one discovers significant variance in the total
time required to accomplish these objectives, ranging from hours to weeks or
months. Einstein Discovery minimizes the time for model creation, deployment,
and tuning, thus increasing business value in cases where rapid "time to market"
delivery is critical.


Augmented Intelligence Most ML products exist in a "data vacuum".
Datasets are typically extracted, wrangled, and stored solely for the purpose of
model creation. Einstein Discovery is fully integrated with the CRM Analytics
data platform, which is intrinsically part of the larger Salesforce SaaS
environment. Einstein Discovery enjoys extensive, native integration capabilities
that go well beyond simply producing models. Model output can augment intelligence
across a variety of mechanisms. Users can mash up descriptive, predictive, and
prescriptive types of data assets within the same platform, all derived from the
same datasets. This comprehensive framework allows admins and developers to
rapidly create highly useful apps that deliver the full spectrum of data insights
and embed them strategically in operational workflows.


Figure 5: CRM Application Augmented with CRM Analytics Visualization and 
Einstein Discovery Predictions/Prescriptions 




Prioritization Einstein Discovery insights and models help users prioritize
and focus their efforts on the areas where they can most favorably affect business
goals (e.g. which of my customers are most likely to churn, and what can I
do about it?). Key business goals and metrics almost always boil down to
minimizing or maximizing something, such as "maximize cross-sell", "minimize
customer attrition", "maximize lead conversion", and so forth. The simple but
powerful idea of maximizing or minimizing a key outcome variable (business
goal) is fundamental to the way Einstein Discovery is designed. It is
purpose-built to support rapid creation and iteration of machine learning models,
and then provide predictive scores (with explanations) that allow business users to
prioritize their workloads and make statistically supported business decisions.
See this article for details and thoughts on prioritization:
https://www.linkedin.com/pulse/prioritization-hypothesis-darvish-shadravan/


Process Automation Einstein Discovery is connected to the flow core action 
which provides direct access to predictions, top predictors, and improvements 
within Salesforce Flows. This capability allows a Salesforce administrator to 
easily inject (no code) predictive modeling scores and insights directly into 
business processes for the CRM user. 


Figure 6: Einstein Discovery in Flows 
Einstein Discovery supports the option to embed predictive intelligence in
your process automation formulas. With the Einstein Discovery PREDICT 
function, flows and business decisions can be driven by automation logic based 
on predicted outcomes. For example, in an approval process, a formula can 
determine whether a predicted outcome meets a threshold required for automatic 
approval. The PREDICT function is available when defining formulas for Next 
Best Action, validation rules, flows (screen, headless, and invocable), processes 
(in Process Builder), workflow rules, approval processes, predefined field values, 
field update actions, and default values. 


Figure 7: Predict Function Driving Business Process 
Integration with Operational Reporting Salesforce operational reporting is
the largest CRM reporting platform in existence. Because of its tightly coupled
integration, Einstein Discovery for Reports is able to analyze Salesforce report
data - quickly and thoroughly - using artificial intelligence and comprehensive
statistical analysis. Einstein Discovery for Reports goes deep into the report
data, explores underlying patterns, identifies insights, and surfaces those insights
with charts and explanations that are easy to understand. Einstein Discovery
for Reports works with Tabular and Summary reports in Salesforce. Visit
https://trailhead.salesforce.com/content/learn/modules/einstein_data_insights_quick_look
for more details.

Figure 8: Einstein Discovery for Reports
Actionability For machine learning tools designed for professional ML
practitioners, injection into the business workflow is often an afterthought. Their
main objectives commonly end at model creation. However, a primary driver of
ROI for machine learning investments is when organizations can leverage model
output at run time to improve business outcomes.

Actionability in Einstein Discovery is a key differentiator from most ML
tools. Actionable variables are variables that can be controlled, such as "place
customer on marketing journey" or "arrange annual policy review". The analyst
creating the model can designate one or more variables as "actionable". Einstein
Discovery applies prescriptive analytics on actionable variables to generate
suggested actions that users can take at run time to improve the predicted
outcome.

From Beyondcore patent number 9,098,810 - Recommending changes to
variables of a data set to impact a desired outcome of the data set: "A method for
recommending actions to affect an outcome of a process, the method comprising
a computer system automatically performing the following: receiving from a user
an identification of an outcome to be affected; processing a data set containing
observations of the process, the observations expressed as values for a plurality
of variables and for the outcome, wherein processing the data set determines
behaviors for different variable combinations with respect to the outcome, the
variable combinations defined by values for one or more of the variables; receiving
an identification of one or more actionable variables from the plurality of variables;
for pairs of a first variable combination and a second variable combination,
wherein the first and second variable combinations are the same except that one or
more of the actionable variables take first values in the first variable combination
and take different second values in the second variable combination, predicting
an impact of changing the actionable variables in the first variable combination
from the first values to the second values by applying (a) the behavior of the
second variable combination to (b) a population of the first variable combination;
and recommending actions to change actionable variables based on the predicted
impacts."
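The general idea of recommending changes to actionable variables can be illustrated with a short sketch. This is a simplified, hypothetical illustration and not the patented method or Einstein Discovery's implementation: `recommend_actions`, the toy scoring function, and the field names are all invented for the example, and only single-variable changes are considered.

```python
def recommend_actions(record, actionable_values, predict, maximize=True):
    """Rank single-variable changes by their predicted effect on the outcome."""
    baseline = predict(record)
    suggestions = []
    for variable, candidates in actionable_values.items():
        for value in candidates:
            if record.get(variable) == value:
                continue  # no change, nothing to recommend
            changed = dict(record, **{variable: value})
            impact = predict(changed) - baseline
            # Keep only changes that move the outcome in the desired direction.
            if impact != 0 and (impact > 0) == maximize:
                suggestions.append((variable, value, impact))
    suggestions.sort(key=lambda s: abs(s[2]), reverse=True)
    return suggestions

# Hypothetical scoring function standing in for a deployed model.
def toy_retention_score(record):
    score = 0.50
    if record["on_marketing_journey"]:
        score += 0.15
    if record["annual_review"]:
        score += 0.10
    if record["tenure_years"] > 3:
        score += 0.05
    return score

customer = {"on_marketing_journey": False, "annual_review": False, "tenure_years": 2}
actions = recommend_actions(
    customer,
    {"on_marketing_journey": [True, False], "annual_review": [True, False]},
    toy_retention_score,
)
```

Note that `tenure_years` is never varied: it influences the score but is not something a user can act on, which is the distinction the "actionable" designation captures.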


3 Unique Einstein Discovery Capabilities at the 
Data Science Layer 


Summary The data science layer encompasses features and capabilities that 
allow Einstein Discovery to perform effective automated machine learning. This 
suite of capabilities includes augmenting, delegating, and offloading some of the 
more arduous and low-value tasks of data scientists. These features provide 
transparency for deep inspection of models built by users who are subject matter 
experts in the data but are not data scientists. Einstein Discovery has unique 
strengths in the following key areas: 


• Model Training and Assessment
• Explainability
• Descriptive, Diagnostic, Predictive, and Prescriptive Analytics
• Algorithms
• Data Preparation and Cleansing
• Transparency and Trust
• Data Security and Bias Protection
Model Training and Assessment In most data science projects, a common
approach for model training and assessment is to randomly divide a dataset into
three parts:


• The training set is used to fit the model
• The validation set is used to estimate the prediction error rate for model
  selection
• The test set helps to assess model performance


By default, Einstein Discovery simplifies these steps for the user with a k-fold
cross-validation technique, calling the H2O.ai splitFrame function. Behind the
scenes, when a user creates a story, Einstein Discovery uses part of the respective 
dataset to fit the model, and a different part to test it. Einstein Discovery 
randomly partitions the data into four equal size datasets. After model creation, 
Einstein Discovery provides fit metrics for each of the four folds to quantitatively 
and visually reveal model performance with regard to over-fitting and other 
common modeling problems. 

In Einstein Discovery, the H2O.ai K-fold function parameter is set at 4 
(nfolds=4, so five models are built). The first four models (cross-validation 
models) are built on 80% of the training data, and the remaining 20% is held 
out for each of the four models. Then, the main model is built on 100% of 
the training data. Note that the analyst can modify the default 80/20 ratio if 
desired, as shown in the image below. 


Figure 9: Training/Validation ratio 
The primary model contains the appropriate training and cross-validation
metrics. All four of the cross-validation models contain training metrics (from 
the 80% training data) and validation metrics (from their 20% holdout data). For 
the main model, the four holdout predictions are combined into one prediction 
for the full training dataset. This “holdout prediction” is then scored, and the 
overall cross-validation metrics are computed. 
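The cross-validation flow described above (one model per fold, holdout predictions combined into a single prediction for the full dataset, then a main model trained on all rows) can be sketched generically as follows. This is neither H2O's nor Einstein Discovery's code; the function names and the trivial stand-in learner are assumptions. As a general property of k equal folds, each cross-validation model trains on (k-1)/k of the rows, which is 75% for k=4 and 80% for k=5.

```python
import random

def k_fold_indices(n_rows, n_folds=4, seed=0):
    """Randomly partition row indices into n_folds near-equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

def cross_validate(xs, ys, fit, n_folds=4, seed=0):
    """Return (holdout_predictions, final_model).

    Each row is predicted exactly once, by the model that did not see it.
    """
    folds = k_fold_indices(len(xs), n_folds, seed)
    holdout = [None] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            holdout[i] = model(xs[i])
    final_model = fit(xs, ys)  # the main model uses 100% of the training data
    return holdout, final_model

def fit_mean(xs, ys):
    """Hypothetical stand-in learner: always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
holdout, final_model = cross_validate(xs, ys, fit_mean)
# Score the combined holdout predictions to get an overall cross-validation metric.
cv_mse = sum((p - y) ** 2 for p, y in zip(holdout, ys)) / len(ys)
```

With `n_folds=4` the loop builds four cross-validation models plus the final one, matching the "five models" count mentioned in the text.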


Explainability Historically, machine learning tools have been designed for
data scientists and statisticians - not the business user who requires explainability
and interpretability. Unlike other ML products, from its very beginning, Einstein
Discovery has always emphasized model interpretability as a core design principle.
This key tenet has motivated the development team to use industry standard
algorithms (described in "Algorithms" later in this section) to produce highly
explainable and easily interpretable output. Hence, Einstein Discovery generates
explorable visualizations and rich, natural-language narrative explanations in
business-friendly terms.

With high model interpretability as a native capability, Einstein Discovery 
allows for the delegation of model creation (and resulting data analysis) to 
business analysts and other users who may not have data scientist skills. This is 
a key differentiator with Einstein Discovery. Within moments of generating a 
model on a dataset, the user sees a rich set of navigable insights derived from 
statistically-informed analysis and ML on a massive scale. Individuals who are 
subject matter experts in the data, but not necessarily data professionals, can 
rapidly create, iterate, interpret, and deploy predictive and prescriptive models. 

In one of the five key original patents for Beyondcore (Patent number
9,129,226), the following statement was made regarding analysis of datasets by
"inexpert" humans: "A combined computer/human approach is used to detect
actionable insights in large data sets. Automated computer analysis used to
identify patterns... These are presented to humans for feedback, where the humans
may have little to no training in the statistical methods used to detect actionable
insights."


Descriptive/Diagnostic/Predictive/Prescriptive It is rare for a machine 
learning product to produce output that encompasses the entire spectrum of 
major analytics functionality with explorable visualizations, natural language 
explanations, "what if" capabilities, a comprehensive analysis of the dataset,
predictions, and prescriptive suggestions on how to improve predicted outcomes 
based on any designated actionable variables. When surveying tools that can be 
used across a variety of personas, including some without statistical skills, these 
capabilities represent a key driver of ROI. 


Algorithms In order to provide best practice statistical analysis that is
optimized for interpretability and explainability, Einstein Discovery uses
industry-standard algorithms provided in H2O.ai, an algorithm library that is
familiar to data scientists across industries. Einstein Discovery uses the
following algorithms for model building and transformations:


• GLM (Generalized Linear Model) - either a piecewise linear model with
  ridge regression for models with a numerical outcome variable, or a
  piecewise logistic model with ridge regression for binary classification
  models.

• XGBoost

• Gradient Boosting Machine (GBM)

• Random Forest

• Multiclass - Einstein Discovery supports multiclass classification for
  outcome variables containing 10 or fewer classes. In a Multiclass model,
  Einstein Discovery uses Cramér's V to measure the correlation and strength
  of association between variables.

• Holt-Winters with additive trend and seasonality - Some models necessitate
  the usage of temporal values for trending and forecasting. Einstein
  Discovery allows the analyst to select a supplemental dataset which contains
  time-based variables that are used to predict future values for each row
  in the dataset. At model creation time, the user defines the time interval
  (day, week, month, or quarter) and the preferred number of time periods to
  project into the future. If appropriate, seasonality can also be adjusted
  to a custom value.

• tf-idf and K-means - Einstein Discovery uses a combination of tf-idf and
  K-means clustering to provide insights on unstructured text fields. This is
  performed via the text clustering transformation on model features that
  consist of unstructured text such as customer comments, activity notes,
  surveys, etc. After the text is analyzed with tf-idf, the most important three
  terms are clustered into ten buckets which are used as predictor variables in
  the model as shown below.
Figure 10: Text Clustering on Unstructured Data
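For the multiclass item in the list above, Cramér's V has a standard definition that can be sketched in a few lines of plain Python. This is the textbook statistic, offered as an illustration; how Einstein Discovery computes or applies it internally is not described here, and the example tables are made up.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    smaller_dim = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (total * smaller_dim))

# Hypothetical counts: rows are outcome classes, columns are predictor categories.
perfectly_associated = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
independent = [[10, 10], [20, 20]]

v_strong = cramers_v(perfectly_associated)
v_none = cramers_v(independent)
```

The statistic ranges from 0 (no association) to 1 (each predictor category maps to exactly one outcome class), which makes it comparable across tables of different sizes.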
* For all decision-tree based algorithms, Einstein Discovery uses Shapley
values in order to provide a level of interpretability that is comparable to
GLM-based models.
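As background on what a Shapley value measures, the sketch below computes exact Shapley values for a single prediction by enumerating every coalition of features. It is a didactic illustration under stated assumptions: the toy model, feature names, and baseline row are hypothetical, and production tools typically rely on faster tree-specific approximations rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction.

    Features absent from a coalition take their baseline value. Cost grows as
    2**n_features, so this is only practical for a handful of features.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        row = {f: (instance[f] if f in coalition else baseline[f]) for f in names}
        return predict(row)

    result = {}
    for feature in names:
        others = [f for f in names if f != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {feature})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        result[feature] = total
    return result

# Hypothetical tree-like scoring rule with an interaction between features.
def toy_model(row):
    if row["discount"] > 0.2:
        return 0.9 if row["region"] == "West" else 0.6
    return 0.3

instance = {"discount": 0.25, "region": "West"}
baseline = {"discount": 0.0, "region": "East"}
phi = shapley_values(toy_model, instance, baseline)
```

The per-feature contributions always sum to the difference between the prediction for the instance and the prediction for the baseline, which is what lets a tree-based score be explained factor by factor in the same way as a linear model's coefficients.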


* For any categorical variables used to create a model, Einstein Discovery 
automatically performs either One-Hot or Label encoding on behalf of the user. 
This auto-encoding happens for all algorithms supported by Einstein Discovery. 


For regression-based models, Einstein Discovery uses a combination of L1
(Lasso) and L2 (Ridge) Regularization to minimize over-fitting. Optimal lambda
(λ) values are automatically determined and applied on behalf of the user.


Figure 11: Ridge Regression Formula 
Figure 12: Ridge Regression (L2 Regularization), shown below, seeks to improve
on the Ordinary Least Squares (OLS) technique by imposing a penalty term 
to the cost function in order to prevent over-fitting and create a less complex 
model. 
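For reference, the standard textbook forms of these objectives can be written as follows. The notation is an assumption introduced here rather than taken from Einstein Discovery: y_i is the observed outcome for row i, x_i its feature vector, β the coefficient vector with p entries, and λ ≥ 0 the penalty strength.

```latex
\begin{aligned}
\text{OLS:}\quad & \min_{\beta}\ \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2 \\
\text{Ridge (L2):}\quad & \min_{\beta}\ \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\text{Lasso (L1):}\quad & \min_{\beta}\ \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\end{aligned}
```

Setting λ = 0 recovers ordinary least squares; larger values of λ shrink the coefficients toward zero, trading a little bias for lower variance.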




Piecewise linear models are not the single straight-line graphs that many
users are familiar with. In Einstein Discovery, the piecewise algorithm fits
straight lines for each decile of a continuous variable. This approach provides
the benefits of regression simplicity and simultaneously adds the mathematical
prowess to handle datasets that are not truly linear.
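The idea of fitting a separate straight line per decile can be sketched as below. This is a simplified illustration, not Einstein Discovery's algorithm: it fits an independent least-squares line inside each equal-count bin of a single predictor, assumes there are at least as many rows as bins, and uses made-up data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y  # all x identical: fall back to a flat line
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x

def fit_piecewise_by_quantile(xs, ys, n_bins=10):
    """Fit one straight line per equal-count bin of x (deciles when n_bins=10)."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    segments = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        bin_x = [x for x, _ in chunk]
        bin_y = [y for _, y in chunk]
        slope, intercept = fit_line(bin_x, bin_y)
        segments.append((bin_x[0], bin_x[-1], slope, intercept))
    return segments

# Hypothetical curved relationship that one straight line would fit poorly.
xs = [i / 10 for i in range(200)]
ys = [x * x for x in xs]
segments = fit_piecewise_by_quantile(xs, ys)
slopes = [s[2] for s in segments]
```

On this curved data the slope rises from one decile to the next, which is how a collection of straight segments can track a relationship that is not truly linear while each segment remains as easy to interpret as ordinary regression.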


Figure 13: Typical two-part Piecewise Regression 
Figure 14: Einstein Discovery Piecewise Regression

When a user creates a story (and model) from a CRM Analytics dataset,
Einstein Discovery automatically determines and presents viable outcome variables
from all columns in the dataset. The target outcome can be a numeric continuous
variable (for example, customer lifetime value), or a binary classification value
(for example, churn/no churn or yes/no), or a multiclass classification (up to 10
classes). The user selects whether to minimize or maximize the outcome variable,
such as “Minimize propensity to churn (true)”.


Based on the selected outcome, Einstein Discovery automatically determines
which modeling algorithms can be applied to the model.
GLM is the default algorithm for all models, but the user can alternatively
select any of the available algorithm options. In addition, the user can select
"Model Tournament" to test all available algorithm types for the current model.
The winning model is selected based on its respective performance metric (e.g.
R² for GLM).
Figure 15: Story Creation Wizard Showing Predictor Variables With Correlation
Percentage (Relative to the Outcome Variable) and Available Algorithms 
Alerts, Data Preparation, and Cleansing To assist users who lack data
science expertise, Einstein Discovery automatically detects and alerts users to
common problems that arise in datasets targeted for predictive modeling. This
empowers users who are "untrained" in statistical data methods to alleviate these
issues and produce more accurate and reliable models. Data safeguards enable
users to intervene and address the following issues that are commonly found in
training data:


• Multicollinearity Alert - Multiple variables that provide essentially the
same information; removing redundant variables simplifies the model while
maintaining accuracy

• Outliers - Observations with values that are unusually large or small,
which can distort calculated averages

• Missing Values - Einstein Discovery detects when a variable is missing
a high percentage of values, which can lower the quality of your insights or
model. For numerical variables, Einstein Discovery allows you to impute
missing values in your dataset by automatically replacing them with
data derived from another subset of your data.

• Strong Predictors - Variables that have a very strong correlation with
the outcome variable; in certain cases, such variables should be excluded

• Identical Values - All observations in a variable belong to the same
category, which contributes no value to the analysis

• Dominant Values - Most observations in a field are in the same category,
which contributes little value to the model

• Disparate Impact - If recent story updates are affecting variables that
the analyst has flagged as sensitive (having the potential for bias), Einstein
Discovery may warn the user of disparate impact (in which variables are
being treated unequally in the model)

• Imbalanced Data - In some binary classification scenarios, a training
dataset will have a disproportionate ratio of observations in one class. This
can lead to a number of potential problems in model creation. Einstein
Discovery helps solve these issues by raising an alert any time the imbalance
is greater than an 85/15 ratio. Einstein Discovery applies a technique that
appropriately re-weights each row by adding a “weights column” to the
training frame.

• Bucketing - Einstein Discovery attempts to optimize binning of the data
on behalf of the user

• Cross Validation - If cross-validation quantitative tests fail, Einstein
Discovery alerts the user
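
The Imbalanced Data alert and row re-weighting described in the list above can be illustrated with a short sketch. The source does not specify the weighting formula, so the inverse-class-frequency scheme below is an assumption chosen for illustration, and the function names are hypothetical rather than part of any Einstein Discovery API.

```python
from collections import Counter


def is_imbalanced(labels, max_share=0.85):
    """Return True when the largest class exceeds the 85/15 ratio."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels) > max_share


def balanced_row_weights(labels):
    """Build a 'weights column': one weight per row.

    Assumed scheme (not confirmed by the source): each row is weighted by
    n / (k * class_count), so every class contributes equally in total.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[label]) for label in labels]


churned = [False] * 90 + [True] * 10
if is_imbalanced(churned):
    weights = balanced_row_weights(churned)
    print(weights[0], weights[-1])
```

With this scheme, rows in the rare class receive proportionally larger weights, so the two classes carry the same total weight during training.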


A full list of data quality and model alerts is available here:
https://help.salesforce.com/s/articleView?language=en_US&type=5&id=sf.bi_edd_quality_alert.htm

Figure 16: Einstein Discovery Data Validation

[Screenshot: the Model Validation panel, listing data validation checks
(Duplicates, Outliers, Strongest Predictors, Identical Values, Dominant Values,
Bias, Recommended Buckets) alongside the cross-validation result.]



Outliers (extreme values) are one of the most common problems in machine
learning datasets. Because Einstein Discovery models are sensitive to outliers,
the product automatically identifies data values that lie more than five
standard deviations from the mean, and prompts the user to decide whether to
exclude them from analysis. The user can also reduce outlier influence by
applying transformations or by converting a numeric variable to a categorical
one with binning.
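
As a rough illustration of the outlier rule described above, the following sketch flags values that are far from the center of a column. It assumes the five-standard-deviation distance is measured from the mean and uses the population standard deviation; both choices, and the function name, are assumptions rather than documented product behavior.

```python
from statistics import mean, pstdev


def flag_outliers(values, k=5.0):
    """Return the values lying more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread, so nothing can be an outlier.
        return []
    return [v for v in values if abs(v - mu) > k * sigma]


order_amounts = [10] * 99 + [1000]
print(flag_outliers(order_amounts))
```

An analyst could then decide whether to exclude the flagged values, transform the column, or bin it into categories.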

Furthermore, Einstein Discovery identifies strong or “obvious” predictors
with an R² value greater than 0.3 so that the analyst can review them, determine
whether they are valid features or problematic, and choose to exclude them from
the model. Einstein Discovery helps users address other data issues, such as
collinear values, excessive instances of null values, fields with only a single value,
or fields with excessively high cardinality (too many unique values).

For scenarios in which continuous value variables should be put into buckets, 
Einstein Discovery uses Kernel Density Estimation, an unsupervised learning 
algorithm, to analyze the numeric columns in your story and suggest appropriate 
bucketing (ranges) based on the actual distribution of the data. The default is 
ten buckets (deciles), although this can be adjusted by the user. This ”smart 
bucketing” approach identifies clusters of data values for each bucket (rather than 
pre-defining fixed bucket ranges). Kernel Density Estimation is a well-known
and widely-accepted method for visualising the distribution of data.

Figure 17: Kernel Density Estimation Formula

$$\hat{f}(x; H) = n^{-1} \sum_{i=1}^{n} K_H(x - X_i)$$

* Where K(x) is the kernel, H is the bandwidth matrix, K_H is the kernel
scaled by H, and X_1, ..., X_n are the n observations.
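
To make the formula concrete, here is a minimal one-dimensional sketch in which the bandwidth matrix H reduces to a single scalar bandwidth and K is the Gaussian kernel. Placing bucket boundaries at local minima of the estimated density is an assumption used only to illustrate how a density estimate can suggest data-driven ranges; the source does not describe the product's actual cut-point rule, and the function names and bandwidth value are hypothetical.

```python
import math


def kde_density(samples, bandwidth):
    """Return a function evaluating a 1-D Gaussian kernel density estimate."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def suggest_cut_points(samples, bandwidth, grid_size=201):
    """Suggest bucket boundaries at local minima of the estimated density."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return []
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    density = kde_density(samples, bandwidth)
    dens = [density(x) for x in grid]
    return [
        grid[i]
        for i in range(1, grid_size - 1)
        if dens[i] < dens[i - 1] and dens[i] < dens[i + 1]
    ]


deal_sizes = [0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.2]
print(suggest_cut_points(deal_sizes, bandwidth=0.5))
```

For data with two well-separated clusters, the sketch returns a single boundary in the sparse region between them, which is the intuition behind bucketing on the actual distribution rather than on fixed ranges.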


Einstein Discovery approaches datasets with the notion that most business 
data is either implicitly or explicitly categorical, even when the data elements 
superficially appear to be numbers, dates, or text. This design approach allows 
Einstein Discovery (upon ingestion of the data) to divide all data into three 
distinct data types - categories, numbers, and dates. The user can explicitly 
choose the data type for each column, if they want, or they can let Einstein 
Discovery make assumptions about the data type based on the shape of the 
values. 

By default, every unique value in a text field gets its own bin if it occurs with
at least 1.5% frequency in the dataset. Any remaining values are assigned to
the special “Other” category. Users have the option to filter out any values they
want to exclude from analysis. They can also move any binned categories into
“Other” if they want. They can manually choose to allow values rarer than 1.5%
to hold their own bins, but only if those values are among the hundred most
frequently observed values.
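
The default text-binning rule above can be sketched in a few lines. The 1.5% threshold and the “Other” label come from the text; the function name and the list-based data layout are illustrative assumptions.

```python
from collections import Counter


def bin_categories(values, min_share=0.015, other_label="Other"):
    """Keep values that reach the minimum frequency; pool the rest as 'Other'."""
    counts = Counter(values)
    total = len(values)
    keep = {v for v, c in counts.items() if c / total >= min_share}
    return [v if v in keep else other_label for v in values]


industries = ["Retail"] * 60 + ["Banking"] * 39 + ["Mining"] * 1
print(Counter(bin_categories(industries)))
```

Here “Mining” appears in only 1% of rows, so it falls below the threshold and is pooled into “Other”.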

Variables with high cardinality (many unique values) can prove difficult
to interpret and visualize. For this reason, Einstein Discovery either ignores
unique values beyond the first 100 in these variables or groups them into a
reserve category. Einstein Discovery allows the user to bypass this restriction
and include one high-cardinality variable in a story (with a maximum of 200
unique values). Einstein Discovery automatically raises an alert when it detects
variables containing more than 100 unique values.

Numerical columns are automatically divided (using KDE) into categories 
by decile. This means that the bottom 10% of numbers get their own category, 
then the next lowest 10% get their own category, and so on. Users can adjust 
these buckets (bins) to a fixed width, or they can crop the range as desired. 
If a numeric column has fewer than 10 different values, Einstein Discovery 
automatically converts the data type to text so that values are analyzed as 
categories. Date elements are intelligently bucketed into either trend or periodic 
values, which an analyst can apply in a variety of typical temporal analyses. All 
of the aforementioned steps are simple options in the GUI. For the analyst, no 
coding or manual manipulation of the data is required. 
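
The decile rule for numeric columns can be approximated as follows. This sketch uses plain quantile cut points rather than KDE, and it returns None for columns with fewer than 10 distinct values to signal that they should be handled as categories; the function name and return convention are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def decile_buckets(values):
    """Assign each value a bucket index 0-9 based on decile cut points.

    Returns None when the column has fewer than 10 distinct values, in which
    case the values would be analyzed as categories instead.
    """
    if len(set(values)) < 10:
        return None
    cuts = quantiles(values, n=10)  # nine cut points between the ten buckets
    return [bisect_right(cuts, v) for v in values]


amounts = list(range(1, 101))
print(Counter(decile_buckets(amounts)))
```

Each bucket receives roughly 10% of the rows, matching the description of the bottom 10% forming the first category, the next 10% the second, and so on.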

Figure 18: Automatic Bucketing of a Continuous Numerical Variable

Einstein Discovery handles observations with empty or null values gracefully
on behalf of the analyst. If the missing value is an outcome variable, Einstein 
Discovery omits that observation during analysis, does not factor it into averages, 
and excludes it from the story insights. If the missing value is an explanatory 
variable used in a model, Einstein Discovery generates a warning rather than 
a prediction. Also, Einstein Discovery can impute missing values for numeric 
variables. 


Einstein Discovery offers Numerical and Categorical Imputation, which 
allows the user to replace null values in observations with derived values. This 
ensures that observations are safely counted during analysis and that the model 
returns predictions instead of warnings. For example, suppose the annual 
policy amount column in an insurance agency dataset is missing many values. 
Imputation lets you replace those missing values with values derived from other 
data, such as the average policy amount by zip code. When creating imputed
values, the analyst can choose whether they prefer to use the average, median,
or mode aggregation to calculate the derived values. 
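
The insurance example above (filling missing policy amounts with a value derived from rows in the same zip code) can be sketched like this. The row layout, column names, and function name are hypothetical, and rows whose group has no observed values are left unchanged.

```python
from statistics import mean, median, mode

AGGREGATIONS = {"average": mean, "median": median, "mode": mode}


def impute_by_group(rows, target, group_by, how="average"):
    """Fill missing `target` values with an aggregate of the same group."""
    aggregate = AGGREGATIONS[how]
    observed = {}
    for row in rows:
        if row[target] is not None:
            observed.setdefault(row[group_by], []).append(row[target])
    fill = {group: aggregate(vals) for group, vals in observed.items()}
    return [
        {**row, target: fill.get(row[group_by])} if row[target] is None else dict(row)
        for row in rows
    ]


policies = [
    {"zip": "94105", "policy_amount": 100},
    {"zip": "94105", "policy_amount": 200},
    {"zip": "94105", "policy_amount": None},
    {"zip": "10001", "policy_amount": 400},
]
print(impute_by_group(policies, "policy_amount", "zip"))
```

Switching `how` to "median" or "mode" mirrors the aggregation choice the analyst has when creating imputed values.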


A typographical clustering algorithm adds Fuzzy Matching capabilities to 
Einstein Discovery. For categorical data, differences in capitalization, plurals, 
abbreviations, and variations cause ambiguity about how to group the data. 
Fuzzy matching enables automatic matching for categorical values that should be 
in the same bucket (e.g. CIO and Cio). Einstein Discovery uses the Levenshtein 
Distance typographical clustering technique to smooth over spelling variations, 
resulting in smarter categorizations and better predictions. 
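
To show what Levenshtein-based grouping looks like in practice, the sketch below computes the edit distance between two strings and greedily maps each label to the first earlier label within a small distance. The case-insensitive comparison, the greedy strategy, and the distance threshold of 1 are illustrative assumptions; the source does not describe how the product clusters values.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def cluster_labels(values, max_distance=1):
    """Map each label to a representative label within max_distance edits."""
    representatives = []
    mapping = {}
    for value in values:
        key = value.strip().lower()
        for rep in representatives:
            if levenshtein(key, rep.strip().lower()) <= max_distance:
                mapping[value] = rep
                break
        else:
            representatives.append(value)
            mapping[value] = value
    return mapping


print(cluster_labels(["CIO", "Cio", "Manager", "Managers", "Director"]))
```

Short labels need care with any fixed threshold: for example, “CIO” and “CFO” are only one edit apart, so a real implementation would need additional safeguards.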


Einstein Discovery has no-code Sentiment Analysis capabilities that allow
an analyst to rapidly derive insights from unstructured data. Comments, survey
data, or other forms of customer feedback often contain some of the richest 
information in your datasets. Einstein Discovery analyzes unstructured data 
during story creation and categorizes sentiment as positive, negative, or neutral. 
For example, “Love our new van” is positive, “The dinner was OK” is neutral, 
and “A rather unpleasant experience” is negative. Previously, data like this 
was often regarded as unusable due to the cost, effort, and expertise required to 
derive sentiment from free-form text. 


Templates Einstein Discovery Story Templates offer an end-to-end workflow
for rapid model creation powered by CRM Analytics Data Prep Recipes. Einstein
Discovery Templates offer an out-of-the-box starter solution, enabling analysts
to focus on customizing their implementation rather than building it from
scratch. The templates provide the quickest way to prepare, load, and analyze
common business use cases with minimal clicks. In certain implementations,
templates can replace the typical process of model deployment. They can also
automate the calculation and creation of dataset prediction fields.

Figure 19: Einstein Discovery Templates

[Screenshot: the story goal picker ("What is the goal of your story?"), offering
Create from Dataset alongside Einstein Discovery Sales templates: Maximize
Customer Revenue, Maximize Win Rate, and Minimize Time To Close.]



Transparency and Trust Many traditional data science and ML products
provide model metrics to assess model quality and performance. Most products
assume, however, that the data professional using the tool knows which
performance measurements they are interested in and where to find them. As with
many aspects of predictive modeling, Einstein Discovery employs traditional
techniques and extends their capabilities in an aggregated and highly
approachable format for the non-statistician.

Einstein Discovery’s Model Metrics section provides robust and relevant
evaluation metrics to assess model quality and performance, including many
of the most commonly used measures (Accuracy, R², Precision/Recall, AIC,
RMSE, MSE, and so on). A data scientist assisting in model validation can
rely on various metrics, statistical reports, and charts, including
prediction examinations, cumulative capture charts, cross-validation metrics, a
visual Confusion Matrix, an ROC/AUC curve chart, and a full coefficient listing.
In addition, R code provides full transparency into the Einstein Discovery-created
model’s transformation and scoring.
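
For readers less familiar with these measures, the following sketch shows how a confusion matrix and the derived accuracy, precision, and recall depend on the classification threshold. It uses the standard textbook definitions; the function names and sample data are illustrative and are not taken from the product.

```python
def confusion_matrix(y_true, scores, threshold=0.5):
    """Count true/false positives and negatives at a given score threshold."""
    tp = fp = tn = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, scores, threshold=0.5):
    """Return accuracy, precision, and recall for a binary classifier."""
    tp, fp, tn, fn = confusion_matrix(y_true, scores, threshold)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


is_churn = [1, 1, 1, 0, 0, 0, 0, 1]
churn_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
print(classification_metrics(is_churn, churn_scores, threshold=0.5))
```

Lowering the threshold generally raises recall at the expense of precision, which is the trade-off a threshold evaluation view is meant to expose.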

Figure 20: Model Metrics - Model Evaluation (Logistical Model)

[Screenshot: the Model Metrics overview for a Churn Prediction story (binomial
GLM, threshold 0.48, AUC 0.9748), with panels for reviewing model accuracy,
setting a threshold, top predictors, and assessing deployment readiness.]

Figure 21: Model Metrics - Threshold Evaluation (Logistical Model)

A comprehensive overview of the model metric features in Einstein Discovery
is available here:
https://www.salesforceblogger.com/2019/09/17/einstein-discovery-how-do-i-assess-the-quality-of-my-model/


Data Security and Bias Protection Unlike other traditional data science
products and tools, Einstein Discovery makes no assumptions about a user’s
expertise in providing data privacy and security in an ML model. Einstein
Discovery assists such users with multiple built-in mechanisms.

To ensure individual privacy within models, Einstein Discovery uses
K-anonymity, a well-known and highly regarded statistical technique.
Einstein Discovery provides a built-in mechanism to exclude data that might
compromise or identify an individual within a dataset, such as a government ID
number or phone number. Einstein Discovery implements K-anonymity in such
a way that any value contained in fewer than 24 rows is ignored, thereby ensuring
that the model retains appropriate levels of anonymization and privacy.
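
The row-count rule above can be sketched as a simple suppression step: any value that appears in fewer than k rows is replaced by a placeholder so that it cannot single out an individual. The threshold of 24 comes from the text; the function name and the use of None as the placeholder are assumptions.

```python
from collections import Counter


def suppress_rare_values(values, k=24, placeholder=None):
    """Replace values that occur in fewer than k rows with a placeholder."""
    counts = Counter(values)
    return [v if counts[v] >= k else placeholder for v in values]


states = ["CA"] * 30 + ["WY"] * 3
masked = suppress_rare_values(states)
print(Counter(masked))
```

In this example the three “WY” rows are suppressed because the value is too rare to remain anonymous, while “CA” is kept.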

Einstein Discovery also helps users create accountable AI models that detect
and flag potentially biased variables. Einstein Discovery implements the
Salesforce “Ethical AI” initiative, helping analysts prevent unintended
statistical or ethical/societal bias. A user can designate a variable as a
“sensitive variable” in order to selectively examine and optionally exclude
specific data values from models. Einstein Discovery then notifies the user of
problematic correlations. To detect strong correlation (collinearity) with the
sensitive field, Einstein Discovery calculates a Cramér’s V score and raises an
alert for anything above 0.5. Furthermore, Einstein Discovery proactively
detects disparate impact in any variables that are being treated unequally in
the model, and then notifies the model creator. This alert allows analysts to
easily remove disparate impact bias from predictions, resulting in more ethical
and accountable models.
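
Cramér's V measures the strength of association between two categorical variables on a scale from 0 (none) to 1 (perfect). The sketch below uses the standard chi-square-based definition without bias correction; whether the product applies a correction is not stated in the source, and the function name and sample data are illustrative.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Compute Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    if n == 0:
        return 0.0
    min_dim = min(len(row_totals), len(col_totals)) - 1
    if min_dim == 0:
        # A variable with a single category carries no association.
        return 0.0
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * min_dim))


sensitive = ["A", "A", "B", "B"]
proxy = ["north", "north", "south", "south"]
if cramers_v(sensitive, proxy) > 0.5:
    print("alert: field is strongly associated with the sensitive variable")
```

A field that acts as a near-perfect proxy for a sensitive variable scores close to 1 and would exceed the 0.5 alert threshold mentioned above.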

Model Cards Einstein Discovery provides model cards to help businesses
document and communicate important usage information about predictions.
A model card shows statistics associated with the data that’s used to train
the model. It can also show any optional explanations about the prediction’s
intended use, design assumptions, target audience, capabilities and limitations,
and other relevant information. Disclosing these details helps users understand
predictions, differentiate among multiple predictions, and make ethical, informed
decisions about whether a prediction appropriately suits their use case.

Figure 22: Model Card

Bring Your Own Model (BYOM) Einstein Discovery enables data scientists
to operationalize externally-created models into Salesforce.

Figure 23: Bring Your Own Model

[Screenshot: the Upload a Model wizard, with fields for model name, description,
and model type (Regression or Binary Classification) for an externally created
TensorFlow model.]


From the Model Manager in Analytics Studio, a data scientist can upload and
deploy externally created Python code and TensorFlow wrappers into Salesforce.
Data scientists can design, build, test, and tune models using their own tools,
and then operationalize them for Salesforce users. These external models can
implement techniques that go beyond what Einstein Discovery supports natively
(for example, Deep Learning), as long as each one is packaged as a TensorFlow
model (models written in Python can often be converted to TensorFlow models).
For externally built models, Einstein Discovery supports many of the same
capabilities as for natively built models, which reduces the deployment costs
and implementation hurdles for teams tasked with deploying machine learning
models into business operations at scale.


4 Unique Einstein Discovery Capabilities at the 
ML Ops Layer 


Summary The machine learning operations layer provides capabilities that
allow administrators and data scientists to easily inspect, refresh, and manage
models over time. This includes the following capabilities:

• Model Management and Monitoring

• Model Versioning

• Model Deployment

• Model Cards


Model Management and Monitoring

• Model monitoring tools provide real-time accuracy reporting.

• The Accuracy Analytics App provides Model Accuracy and Performance
Dashboards.

• Model alerts provide subscription-based notifications for model performance
drift as well as out-of-bounds and missing values that exceed configured
thresholds.

• Automated model refreshes with a configurable schedule

• Side-by-side model comparison

• Configurable target thresholds for each logistic model

• Residual Plot Chart for linear models

• Automatic snapshots of previous models

• Customizable evaluation order for predictions with multiple models
(supports targeted predictions for segmented data)

• Model Cards to document and disclose important usage information about
your predictions to others


Figure 24: Model Management

Model Versioning The individual(s) responsible for model management can
track updates to models that result from changes in the data or improvements 
in story or model settings. The Einstein Discovery Model Manager displays a 
model’s version history so that the model owner can pinpoint exactly when it was 
updated and by whom, and whether it’s scheduled for an upcoming refresh job. 
For models that are not performing as expected, the user can easily revert to a 
previous model version with superior performance. To investigate the underlying 
settings associated with a particular model version, the user can easily retrieve 
and examine the specific story version on which it’s based. 


Figure 25: Model Versioning

Model Deployment For data science and machine learning projects, arguably
the most challenging aspect of realizing sufficient return on investment is the
effort and complexity required to operationalize models into production. The goal
is to deliver models into business operations so that end users can benefit from
consuming predictions, targeting prioritized workloads, and acting on suggested
improvements in real time. When BeyondCore became Einstein Discovery, it
was integrated into Salesforce and engineered to natively interact with the
existing Salesforce data platform (CRM Analytics). In addition to leveraging
the enterprise features (actionability, security, etc.) mentioned previously, this
integration critically allows for “point and click deployment” of a predictive
model with real-time scoring and performance monitoring. Using Einstein
Discovery, we have seen customers build and deploy a model into production for
thousands of users in a matter of days or weeks, well ahead of the typical
lengthy ML project cycle.


Figure 26: Click-through Model Deployment

Figure 27: Deploy Model User Interface


5 Unique Einstein Discovery Capabilities at the
Platform Layer


Summary The data platform layer is a critical, but often under-appreciated,
component of a machine learning solution. Einstein Discovery has the unique
advantage of seamless integration in the Salesforce CRM Analytics SaaS
environment. Native embedding provides high performance and backend
interoperability with the big data storage, connectivity, enterprise-ready
security, and robust data management capabilities inherent in the Salesforce
cloud service.

Figure 28: Einstein Discovery Logical Architecture

Data Platform Einstein Discovery is deeply integrated with CRM Analytics,
which in turn runs natively within Salesforce data centers. In the Salesforce
data centers, Einstein Discovery leverages Kubernetes and Spark capabilities
to achieve maximum compute performance and scalable, on-demand model
building. Einstein Discovery uses CRM Analytics datasets to read data and
store predictions, improvements, and top factors. Using the CRM Analytics
framework, data engineers and admins can construct automated data integration
solutions that interact with Einstein Discovery stories and models. Benefits
include:


• A robust ELT (extract, load, transform) toolset and workflow framework

• Built-in connectors to most popular cloud data sources and live data
connections to Snowflake (Pilot)

• ML-based smart data transformations for sophisticated dataset data prep
operations. These transformations utilize the scalable Spark platform.
Examples include sentiment analysis, clustering, and typographic fuzzy
matching.

• Visual, code-free tools for building automated workflows and data prep
pipelines

• Big data scalability. CRM Analytics datasets can manage up to 2 billion
rows by default. Einstein Discovery can analyze up to 20 million of those
rows per story.

• Public APIs, declarative elements, and programmatic access

• Scheduling tools that allow admins to specify when to run jobs that extract,
load, and transform data

• Salesforce’s secure, well known, and trusted cloud platform

• Sandbox environment with full datasets for testing and development, as
well as the ability to package and promote tested models from sandbox to
production environments


Einstein Prediction Service Einstein Discovery delivers interoperability with 
applications that are external to the Salesforce ecosystem, providing predictions 
and improvements via Einstein Prediction Service. Einstein Discovery is designed 
to be agnostic regarding the data sources used to build a model. To support 
programmatic interaction with deployed models from external applications, the 
Einstein Prediction Service API enables users to programmatically get, create, 
and manage predictions and associated models. 

Developers can access Einstein Prediction Service APIs to embed predictions 
and descriptive and diagnostic insights into any website or application that is 
capable of consuming standard web service (REST) APIs. This lets data
scientists, data engineers, and business intelligence professionals use
Einstein Discovery beyond Salesforce use cases and integrate its scale, speed,
and features into their existing data assets and applications.
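
As a rough illustration of programmatic access, the sketch below assembles (but does not send) an HTTP request for a prediction on raw data. The endpoint path and the payload fields (`predictionDefinition`, `type`, `columnNames`, `rows`) reflect our understanding of the Einstein Prediction Service REST API and should be verified against the official Salesforce API reference. The instance URL, API version, access token, prediction definition ID, and column names are all placeholders.

```python
import json
from urllib.request import Request


def build_predict_request(instance_url, api_version, access_token,
                          prediction_definition_id, column_names, rows):
    """Assemble a POST request for a raw-data prediction (nothing is sent)."""
    # Assumed endpoint path; confirm against the official API reference.
    url = f"{instance_url}/services/data/v{api_version}/smartdatadiscovery/predict"
    payload = {
        "predictionDefinition": prediction_definition_id,
        "type": "RawData",
        "columnNames": column_names,
        "rows": rows,
    }
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_predict_request(
    "https://example.my.salesforce.com",  # placeholder instance URL
    "52.0",                               # placeholder API version
    "ACCESS_TOKEN",                       # placeholder OAuth token
    "1ORxx0000000000",                    # placeholder prediction definition ID
    ["Tenure_Months", "Plan_Type"],       # hypothetical model columns
    [["12", "Premium"]],
)
print(request.get_method(), request.full_url)
```

An external application would pass the request to `urllib.request.urlopen` (or an equivalent HTTP client) after obtaining a valid OAuth access token.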


Einstein Discovery in Tableau Now that Tableau and CRM Analytics products
are under the same roof, no-code integration is available between Einstein
Discovery and Tableau. Tableau users can get predictions and improvements for
Tableau data using Einstein Discovery models that are deployed in Salesforce.
The Einstein Discovery model derives a prediction from Tableau data and returns
it to Tableau, where it appears in worksheets and dashboards at lightning speed.


The options for integrating Einstein Discovery models into Tableau Version
2021.1 or later include:

• Connect to the Einstein Discovery analytics extension to interact with
deployed Einstein Discovery-powered models from Tableau.

• Embed predictions in a Tableau workbook by pasting generated table
calculation scripts from Einstein Discovery into calculated fields in Tableau.

• Give users dynamic, on-demand predictions based on their Tableau data
using the Einstein Discovery dashboard extension.

• In Tableau Prep, add Einstein Discovery prediction steps to flows to enrich
your flow output with predictions and, optionally, improvements and top
factors.


Figure 29: Einstein Predictions Surfaced in Tableau

Being able to easily inject ML predictions directly into a Tableau visualization
with no code appeals to many analysts, users, and data science professionals
who use Tableau daily.


6 Resources and Further Reading 


For reference, training, and technical context, we provide the links below as a 
starting point. 


Popular Learning Resources: https://salesforce.quip.com/FashAYUAL9VC

https://www.tableau.com/products/add-ons/einstein-discovery

https://trailhead.salesforce.com/en/content/learn/modules/einstein-discovery-quick-look

https://www.salesforceblogger.com/2020/12/17/accelerating-prescriptive-analytics-using-einstein-discovery-templates/

https://trailhead.salesforce.com/en/content/learn/trails/wave_analytics_einstein_discovery

https://marktossell.com/2018/07/07/what-is-einstein-discovery-and-what-can-it-do-for-your-business/

https://www.forcetalks.com/blog/advantages-of-salesforce-einstein-discovery/

https://www.salesforceblogger.com/2019/09/17/einstein-discovery-how-do-i-assess-the-quality-of-my-model/

https://www.salesforceblogger.com/2020/03/11/preparing-your-data-for-einstein-discovery/

https://www.salesforceblogger.com/2019/10/30/take-your-ed-model-from-good-to-great/

https://www.linkedin.com/pulse/prioritization-hypothesis-darvish-shadravan/

https://www.linkedin.com/pulse/einstein-discovery-your-data-science-great-story-darvish-shadravan/

https://www.salesforceblogger.com/2020/03/09/staying-focused-using-a-methodology-to-organize-your-thoughts-and-project-activit

https://www.salesforceblogger.com/2020/02/17/what-kind-of-questions-can-einstein-analytics-stories-answer/

https://www.salesforceblogger.com/2020/02/05/what-are-einstein-analytics-stories-and-when-would-you-use-them/

https://www.salesforceblogger.com/2018/01/16/salesforce-einstein-what/

https://c1.sfdcstatic.com/content/dam/web/en_us/www/assets/pdf/augmented-ai-guide.pdf

https://www.salesforce.com/blog/2020/02/make-sure-data-ai-ready.html

https://www.salesforce.com/blog/2020/03/build-trust-engage-users-artificial-intelligence.html

Kernel Density Estimation - https://www.jstatsoft.org/article/view/v021i07